Convolutional network from the original high-resolution image. The bone image is the foundation of
the deep learning procedure. Our main objective is to categorize individuals by age using
convolutional neural network (CNN) classification models, such as the Xception and MobileNet models.
With these CNN models, we achieved accuracies of 90% and 94% in classifying people by age.
1. INTRODUCTION

A person's skeletal and biological maturity can be determined from their bone age. Bone age differs from
chronological age, which is established by a person's birth date. Paediatricians compare a child's bone age
with their chronological age to diagnose conditions that affect a child's stature. These measures also help
to evaluate how well treatments for these conditions are working. Formulae have also been developed for
calculating a child's final adult height from the bone age of healthy, normal children. Bone age estimation
is also used to determine the chronological age of children whose birth records are not available. The lack
of birth records is a major issue in our region of the world. The ossification pattern of the hand and wrist
bones is largely predictable and age-specific. The standard age associated with normal development is
therefore determined by comparing the maturity of the hand and wrist bones.

A bone age study can assess how quickly or slowly a child's skeleton is developing, which can assist
doctors in identifying diseases that either slow down or accelerate physical development. Typically, doctors
or paediatric endocrinologists request this test. In a criminal investigation, identifying the age at death,
birth date, year of death, and gender of unidentified human remains can help detectives make the right
identification from among possible matches.
Ultrasonography of the hand bones is being tested, but it is not yet as useful as plain wrist radiography
for showing how far the hand bones have grown and developed. In forensics and anthropology,
determining age is critical for the identification of unknown individuals or skeletal remains. The quality
and quantity of the mortal remains, the period between death and autopsy, the environmental
circumstances, and the structure of the corporeal remains or skeleton parts are all important considerations
in postmortem examination. The examination may also be affected by other case-specific considerations
such as expense, time, and equipment requirements. Numerous techniques for estimating age are available
once these aspects are considered. Teeth can be used as a biological indicator of ageing. Teeth have a
highly mineralised structure that protects them from postmortem decomposition and allows them to endure
fires, alkalis, and acids. Bone can deteriorate over time, but teeth are preserved for a long time and can
hence be used for identification in disasters.

Since the advent of social media platforms, automatic age and gender classification has become
increasingly crucial for a variety of applications. Although there have been significant performance
advances recently for the closely related problem of facial recognition, the performance of existing
techniques on real-world photographs still trails well behind those results. To extract the attributes, the
network first needs to be trained on a large number of photos. Experimental validation on the Radiological
Society of North America (RSNA) bone age benchmark and the images of groups benchmark shows that the
use of attention mechanisms enhances the robustness and accuracy of convolutional neural networks (CNNs).
For age or gender estimation tasks, the majority of studies have used classification schemes. To this end, a
simple convolutional network design that can be used even with little training data has been proposed. We
compare the most recent methods with state-of-the-art techniques using the RSNA bone age benchmark for
age and gender estimation. The ability to accurately determine the age and gender of a person from media
is a major advantage.

However, the performance of CNNs is degraded by the great diversity of facial photos found in the
wild, such as those gathered from the internet. Users become dissatisfied when they notice that
performance has declined.

The fundamental aim of the research is to find a new feed-forward technique with an attention
mechanism that enhances the robustness of existing CNNs. Motivated by the recent success of attention
mechanisms, the technique uses a low-resolution view of an unrestricted face image to locate the patches
that are then analysed at high resolution. Therefore, in addition to improving resolution, our strategy
enables the network to give greater weight to the portions of the image that are least obscured or
deformed, making the model more resistant to noise and distractions. We carefully compare the
effectiveness of our attention pipeline with that of recent state-of-the-art CNNs trained for facial
recognition, using age and gender recognition benchmarks. Applying the attention mechanism to standard
CNNs on the RSNA and images of groups (IoG) datasets improves age and gender recognition compared with
the standard CNNs alone. Because RSNA and IoG are composed of unrestricted facial photos captured in the
field, they serve as examples of how our model is able to recognise soft biometric features (age and
gender) from facial images.


2. LITERATURE REVIEW 

Forensic anthropology is used to examine the age of a skeleton's bones. Ultrasonography and
radiography are methods used in forensic anthropology. The radiographic methods include the
Tanner-Whitehouse method and the Greulich-Pyle method, which are examples of the scoring and atlas
techniques, respectively. A thorough explanation of the forensic anthropology methods is provided in [1].
The primary goal of the study in [2] is to estimate the age and gender of the Middle Eastern population.
The approach is implemented using a random forest classification algorithm and considers 126 wrist
radiographs from subjects aged between 6 and 78, comprising 76 male samples and 50 female samples. That
work achieves an accuracy of 97% [2].

Numerous factors, such as gender, diet, metabolic, genetic, and social factors, as well as acute and
chronic diseases and, in particular, hormonal changes, can influence bone age. The many standardised
techniques developed over the years can also account for several differences. Therefore, to employ this
knowledge effectively in all of its key medical and non-medical fields of application, it is necessary to
be aware of the full characterisation of the main methods and procedures available, and specifically of
their advantages and disadvantages [3]. A variety of forensic techniques have been used to assess the
deceased's skeletal characteristics, cause of death, and living stature. From a forensic perspective, early
sex determination by bone analysis is crucial. Sex can be determined from measurements and
characteristics of the skull. Male long bones tend to be longer and more massive than female long bones,
with more pronounced muscle attachments, which makes sex evaluation from long bones simpler. With
various odontometric procedures, teeth are particularly helpful in identifying gender [4].

Although bone age estimation using non-radiation means such as ultrasonography has been proposed,
these techniques are not as accurate as radiographic techniques for visualising the hand and wrist bones.
Computerized tomography (CT) visualisation of the clavicle has been thoroughly researched, but it requires
a considerable radiation dose. Magnetic resonance imaging (MRI)-based techniques are being developed,
although more research is needed. Dental age is a different method that also provides a skeletal age
estimate [5].

Separate models were constructed for each gender in view of the observed disparity in growth rates
between the sexes. Image data for some age categories, such as infancy and very early childhood, are much
sparser and show a significant departure from other age ranges in terms of morphology. Different
region-of-interest definitions were used to train a variety of deep learning architectures. The authors
describe the research as preliminary and intend to investigate aspects not covered in that study in future
work [6].

Estimating the skeletal age of a patient with an age assessment method is difficult. In the medical
field today, computerised approaches are used in place of manual procedures to address these limitations,
to the extent that they produce superior evaluations. The research aims to minimise algorithmic issues and
achieve high diagnostic accuracy in current systems [7]. An innovative Tanner-Whitehouse bone age
assessment method is proposed in [8]. In this method, small yet valuable regions are identified, and
convolutional layers are employed to extract characteristics. To realise faster computation, this work
recommends adopting an algorithm [8].

The key findings and accomplishments are as follows: i) the grayscale image is pre-processed into a
three-channel image for use with an ImageNet-pretrained EfficientNet; and ii) the X-ray picture of the hand
bone is resized and cropped to reduce the image area that does not contain the hand bone [9]. The goal in
[10] is to model a decision-support system for paediatric bone age assessment using wrist radiographs that
will speed up radiologists' workflow. The approach in [11] uses networks in which every node and edge
stand for a region of interest (ROI) and its correlation, respectively. An approach to extracting latent
spatial characteristics from multimodal magnetic resonance (MR) images, which may enhance early
identification of multiple sclerosis (MS), is described in [12].

A novel, comprehensive, deep automated skeletal bone age assessment model based on region-based
convolutional neural networks (R-CNN) is proposed in [13]. The procedure known as BoneXpert uses
radiographs of the hand to automatically reconstruct the borders of 15 bones. It then determines the
"intrinsic" bone ages for each of 13 bones (radius, ulna, and 11 short bones) [14].

A technique for estimating bone age from radiographs based on deep learning is presented in [15]. It
offers a quick, deterministic way of determining bone age. New trends in this field of study have been
examined and discussed [16]. The objective of the study in [17] was to assess how storage phosphor plate
(SPP) system images acquired at various compression settings and image resolutions affected the fractal
dimension of alveolar bone. The remodelling of a bone under external loads was simulated in [18]. Modern
deep learning techniques, such as data augmentation, an optimal learning rate finder, and fine-tuning, were
employed to train the model [19], [20]. The X-ray hand bone image is then automatically evaluated using a
convolutional neural network that has been trained to recognise its features [21].

Identification of skeletal remains relies heavily on age determination from bones. A novel age
estimation technique was created by applying an algorithm to bone images from postmortem computed
tomography (PMCT) [22]. The purpose of the study in [23] was to compare the Greulich-Pyle approach with
an automated bone age (BA) assessment to determine how accurate and effective it was. The aim in [24] was
to create an automatic and manual bone age assessment method based on Greulich and Pyle (GP), and to
compare it with a deep learning strategy developed from a training set of developmentally normal
paediatric hand radiographs. Six pre-built CNNs with weights trained on ImageNet are used in [25], with a
transfer learning technique to extract features from preprocessed hand photos. CNN architectures such as
VGG16, DenseNet121, MobileNet, NASNet, Xception, and EfficientNet are used in the work proposed in [26].


3. METHOD 

The machine learning techniques used by earlier systems did not make proper use of the vast number
of examples and image data made available through the internet to enhance classification performance. In
this study, we attempt to close the gap between age and gender estimation methods and automatic facial
recognition technologies. To do this, we follow the effective approach established by existing facial
recognition systems: recent research has shown that deep convolutional neural networks (CNNs) can
significantly improve facial recognition algorithms. When a prediction is incorrect, a deep learning model
adjusts itself through training, whereas with a conventional machine learning model the programmer has to
fix the problem explicitly.

3.1. Data collection

The datasets were gathered from Kaggle. The dataset is made up of two folders, training and testing,
which together contain over 23,000 photos. Of the data, 70% of the photos are used for training and 30%
are used for testing. The dataset is 9.28 GB in size. The test folder has 2,000 photos (162 MB), and the
training folder has 10,611 pictures (9.12 GB). The photos are stored as JPEG files.

The stages taken to create the estimating model include:
- Import all required layers.
- Create the required functions for the following blocks: Conv-BatchNorm block, SeparableConv-BatchNorm block.
- Create a function for each of the three flows: entry, middle, and exit.
- Using these functions, construct the entire model.
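
The stages listed above can be sketched in Keras roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `conv_bn`, `sep_bn`, and `middle_flow_block`, the 64 x 64 input, and the filter count of 32 are all hypothetical choices made to keep the example small, and the entry and exit flows are reduced to a single step each.

```python
import tensorflow as tf
from tensorflow.keras import layers


def conv_bn(x, filters, kernel_size, strides=1):
    """Conv-BatchNorm block: standard convolution followed by batch norm."""
    x = layers.Conv2D(filters, kernel_size, strides=strides,
                      padding="same", use_bias=False)(x)
    return layers.BatchNormalization()(x)


def sep_bn(x, filters, kernel_size, strides=1):
    """SeparableConv-BatchNorm block: separable convolution plus batch norm."""
    x = layers.SeparableConv2D(filters, kernel_size, strides=strides,
                               padding="same", use_bias=False)(x)
    return layers.BatchNormalization()(x)


def middle_flow_block(x, filters):
    """One residual block in the style of the middle flow (assumed layout)."""
    shortcut = x
    for _ in range(3):
        x = layers.ReLU()(x)
        x = sep_bn(x, filters, 3)
    return layers.Add()([shortcut, x])


# Assemble a toy model from the blocks; all sizes are illustrative assumptions.
inputs = tf.keras.Input(shape=(64, 64, 3))
x = conv_bn(inputs, 32, 3, strides=2)          # simplified entry step
x = layers.ReLU()(x)
x = middle_flow_block(x, 32)                   # simplified middle step
outputs = layers.GlobalAveragePooling2D()(x)   # simplified exit step
toy_model = tf.keras.Model(inputs, outputs)
```

A full Xception-style network would repeat the middle block several times and use far more filters; the sketch only shows how the block functions compose into a model.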


3.2. Data preprocessing 

The collected data may contain null values, missing values, or undesirable data that could produce
inaccurate findings. Pre-processing is therefore a crucial stage in which irrelevant data are removed to
obtain better results. The data are pre-processed to delete unusable images and are then divided into
training and testing groups. Here, pre-processing is carried out based on the scanned image's size, shape,
and zoom level. The available photos are transferred into three folders (training, validation, and
testing).
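
As a rough illustration of the split into three groups, the sketch below shuffles a list of image names and partitions it. The 70% training share follows Section 3.1; dividing the remainder equally between validation and testing, the function name `split_ids`, and the example file names are assumptions made for this sketch.

```python
import random


def split_ids(image_ids, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle image ids and split them into training/validation/testing lists."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)   # fixed seed keeps the split reproducible
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    return {
        "training": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "testing": ids[n_train + n_val:],
    }


# Hypothetical file names; real ids would come from the dataset's CSV file.
example_ids = [f"{i}.png" for i in range(1000)]
splits = split_ids(example_ids)
```

Each list can then be used to copy the corresponding files into its folder, or to filter the label table before it is passed to a data generator.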


3.3. Generating sample images 

A selection of the sample images is shown following the dataset splitting. Figures 1 and 2 are
galleries of grayscale pictures from the dataset. Figure 1 shows sample images used for training the
model, and Figure 2 shows sample images used for testing the model.

Figure 1. Selection of training photos
Figure 2. Selection of testing images


3.4. Xception model 

Xception is a convolutional neural network that is 71 layers deep. A version of the network
pretrained on more than a million images from the ImageNet database is available. This connectivity
pattern produces state-of-the-art accuracy on SVHN and CIFAR10/100 (with or without data augmentation).
On the large RSNA dataset, Xception achieves a reasonable level of accuracy while using only around half
as many parameters and FLOPs. Figure 3 shows the layered structure of the Xception model with the growth
rate.

Figure 3. Layered Xception model with growth rate


3.5. Model with MobileNet

MobileNet uses depth-wise separable convolutions. Compared with a network of the same depth that
uses standard convolutions, the number of parameters is significantly reduced. The result is a lightweight
deep neural network. MobileNet provides a good starting point for training classifiers that are relatively
small and very fast.
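
The size of the parameter reduction can be checked with a short calculation. The sketch below counts the weights of a single layer in both forms, ignoring bias terms; the layer dimensions (3 x 3 kernel, 128 input channels, 256 output channels) and the function names are illustrative assumptions and are not taken from the models used in this work.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution (bias terms ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depth-wise separable convolution (bias terms ignored).

    One kernel x kernel filter per input channel (depth-wise step),
    followed by a 1 x 1 convolution that mixes channels (point-wise step).
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_params(3, 128, 256)     # 294912 weights
separable = separable_conv_params(3, 128, 256)   # 33920 weights
reduction = standard / separable                 # roughly 8.7 times fewer weights
```

For a 3 x 3 kernel the saving approaches a factor of nine as the number of output channels grows, which is why the separable form yields much lighter networks at the same depth.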

Age estimation was studied as early as the 1990s, in addition to more recent techniques such as a
pipeline that combined biologically inspired features (BIF) with canonical correlation analysis (CCA) and
partial least squares (PLS) based methods. The use of BIF for face image representation paved the way for
further work and demonstrated how close the automatic method had come to matching human performance.
Before CNNs, the majority of techniques relied on a two-stage pipeline that extracted features such as
local binary patterns (LBP) and then classified the data with a support vector machine (SVM) or
multi-layer perceptron (MLP).

Age and gender can also be determined using more complex CNN models, but most of them need
prior knowledge of the domain. Cascades of deep models have also been considered. The pre-trained
network is capable of classifying photos into 1,000 different object categories, including a variety of
animals, a keyboard, a mouse, and a pencil. In addition to age assessment, CNNs for facial image
processing have been used to determine gender, verify faces, and estimate facial attributes. For instance,
on the difficult Labeled Faces in the Wild (LFW) dataset, one reported strategy achieves a face
verification accuracy of 89.2%. Unfortunately, this level of performance has not yet been shown for facial
analysis applications such as gender recognition.


3.5.1. Feed-forward attention to identify age and gender

The proposed model is made up of three basic modules, as shown in Figure 4: i) an attention CNN
that predicts the attention grid best suited to perform the glimpses; ii) a patch CNN that evaluates the
higher-resolution patches according to the importance predicted by the attention grid; and iii) an MLP,
which combines the data from both CNNs and completes the classification.

An image with reduced resolution is given to the attention CNN, which predicts a k × k attention
grid. The patch CNN then receives the patches, and the attention grid weights its output. An MLP classifier
is then given the combined feature maps from the two streams. Figure 4 illustrates the feed-forward
architecture used to determine age and gender. The attention CNN receives all the training images. This
CNN is trained specifically to predict an attention grid:


$$H \in \mathbb{R}^{k \times k}, \quad H_{i,j} \geq 0 \qquad (1)$$


where:
-  $k$ is an arbitrary number.
-  $H_{i,j}$ represents the (normalized) importance of each patch.
Then, a high-resolution version of the input image is divided into $k \times k$ patches and fed to the patch CNN.

The patch CNN receives the high-resolution face patches. Any model, including the attention CNN,
could be employed. However, we use the early convolutional layers of the attention CNN for this purpose
to lower the processing requirements of the design. The resulting output is:


$$Q \in \mathbb{R}^{k^2 \times m} \qquad (2)$$


where $k^2$ is the patch count and $m$ is the output dimension of the last convolutional layer.

The spatial dimension of Q is then reduced to one using global average pooling (GAP), allowing
images to be fed in their standard resolution. The importance of each grid patch is then taken into
account by weighting these feature maps using H. We also consider two methods for integrating the feature
maps of the two CNNs: i) learning a projection of the patch CNN feature maps to the attention CNN feature
map space, and ii) concatenating them after Z normalisation. In the part that follows, we show that the
normed concatenation approach slightly outperformed the project-and-add method.
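
The pooling, weighting, and concatenation steps can be illustrated with NumPy on random data. The sketch below is an interpretation under several assumptions rather than the implementation used here: the attention grid H is produced by a softmax over random scores, the weighted patch features are summed over patches, L2 normalisation stands in for the normalisation applied before concatenation, and the sizes `k`, `m`, `h`, and `w` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 3, 8      # grid size and feature dimension (illustrative values)
h, w = 4, 4      # spatial size of each patch feature map (illustrative)

# Attention grid H: non-negative weights that sum to one (softmax is assumed).
scores = rng.normal(size=(k, k))
H = np.exp(scores) / np.exp(scores).sum()

# Stand-in for the patch CNN output: one (h, w, m) feature map per patch.
patch_maps = rng.normal(size=(k * k, h, w, m))

# Global average pooling reduces the spatial dimension of each map to one,
# giving Q with shape (k^2, m).
Q = patch_maps.mean(axis=(1, 2))

# Weight each patch's features by its importance, then sum over patches.
patch_feat = (H.reshape(-1, 1) * Q).sum(axis=0)   # shape (m,)


def l2_normalise(v):
    """Scale a vector to unit Euclidean length."""
    return v / np.linalg.norm(v)


# Normed concatenation with a stand-in attention-CNN feature vector.
attn_feat = rng.normal(size=m)
fused = np.concatenate([l2_normalise(attn_feat), l2_normalise(patch_feat)])
```

The fused vector has length 2m and would be the input to the MLP classifier; in the project-and-add alternative, the patch features would instead be projected to the attention feature space and added.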

Figure 5 shows samples from the fourth fold of the RSNA bone age dataset for age and gender. The
generated feature maps are then fed to the final classifier, which is typically composed of the fc6, fc7,
and fc8 layers described in the CNN literature. The network's input was a grayscale image of fixed size
(224 × 224), stored as an array of shape (224, 224, 3). Kernels of size 3 × 3 with a stride of 1 pixel were
used to cover the entire image. Spatial padding was applied to maintain the image's spatial resolution.

Figure 4. Feed forward mechanism
Figure 5. RSNA sample bone image (age)


3.5.2. Model building

Figure 6 illustrates the overall process flow for determining age and gender from the bone image
dataset. It shows the sequence of operations carried out in the research: input, modelling, training, and
testing. After a model is created, it is tested with the test sample images to predict age and gender.


[Flowchart: Image data generation → Create a model → Train and test → Age and gender prediction]


Figure 6. Overall process


4. RESULT AND DISCUSSION 
4.1. Building model for age and gender estimation 

The model is built by a function that takes the input parameters as arguments and returns a Keras
model object of a convolutional neural network. The CNN model is made up of convolutional layers,
max-pooling layers, and fully connected layers. The input image has a size of 224 × 224 × 3. There are 4
convolutional layers with [32, 64, 128, 128] filters, each with a kernel of shape [3, 3]. ReLU is the
activation function for the hidden layers, and softmax is the final activation function. The hidden fully
connected layer has 128 neurons. The model is trained for 40 epochs with a batch size of 40 and 50
validation steps.
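
A model-building function matching this description might look like the following sketch. Several details are assumptions, since the text does not state them: a max-pooling layer is placed after every convolution, the Adam optimiser and categorical cross-entropy loss are used, and the function name `build_cnn` and the class count of 5 are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_cnn(num_classes, input_shape=(224, 224, 3)):
    """Four 3 x 3 conv layers, a 128-unit hidden layer, and a softmax output."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for filters in (32, 64, 128, 128):
        model.add(layers.Conv2D(filters, (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))   # pooling placement is assumed
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    # Optimiser and loss are assumptions; the text names only the metrics.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["categorical_accuracy"])
    return model


cnn = build_cnn(num_classes=5)  # 5 age classes is a placeholder value
```

Training with the settings given above would then correspond to calling `cnn.fit` with `epochs=40` and `validation_steps=50` on generators that yield batches of 40 images.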


from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.xception import preprocess_input


img_size = 256


# df_train and df_valid are the training and validation DataFrames prepared beforehand.
# Each generator applies the Xception input preprocessing to every image.
train_data_generator = ImageDataGenerator(preprocessing_function = preprocess_input)
val_data_generator = ImageDataGenerator(preprocessing_function = preprocess_input)


# The original listing also passed flip_vertical = True to flow_from_dataframe. That is not a
# documented argument of the method (flipping is set through ImageDataGenerator's vertical_flip
# option), so it is left out here.

# training data generator
train_generator = train_data_generator.flow_from_dataframe(
    dataframe = df_train,
    directory = '/content/drive/MyDrive/Project_phase_2/dataset/boneage-training-dataset/boneage-training-dataset',
    x_col = 'id',
    y_col = 'bone_age_z',
    batch_size = 32,
    seed = 42,
    shuffle = True,
    class_mode = 'raw',
    color_mode = 'rgb',
    target_size = (img_size, img_size))


# validation data generator
val_generator = val_data_generator.flow_from_dataframe(
    dataframe = df_valid,
    directory = '/content/drive/MyDrive/Project_phase_2/dataset/boneage-training-dataset/boneage-training-dataset',
    x_col = 'id',
    y_col = 'bone_age_z',
    batch_size = 32,
    seed = 42,
    shuffle = True,
    class_mode = 'raw',
    color_mode = 'rgb',
    target_size = (img_size, img_size))


# test data generator
test_data_generator = ImageDataGenerator(preprocessing_function = preprocess_input)


test_generator = test_data_generator.flow_from_directory(
    directory = '/content/drive/MyDrive/Project_phase_2/dataset/boneage-test-dataset/boneage-test-dataset',
    shuffle = True,
    class_mode = None,
    color_mode = 'rgb',
    target_size = (img_size, img_size))


Figure 7. Model building 
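
The generators in Figure 7 read their target from a column named `bone_age_z`, which the text does not define. Assuming, from the name alone, that it holds bone age standardised to zero mean and unit variance, the conversion and its inverse could be sketched as below. The function names and the example ages (in months) are hypothetical.

```python
from statistics import mean, pstdev

# Example bone ages in months (illustrative values only).
ages_months = [180, 12, 94, 120, 82]

mu = mean(ages_months)        # mean bone age of the sample
sigma = pstdev(ages_months)   # population standard deviation of the sample


def to_z(age):
    """Standardise a bone age to a z-score."""
    return (age - mu) / sigma


def from_z(z):
    """Convert a predicted z-score back to a bone age in months."""
    return z * sigma + mu


z_scores = [to_z(a) for a in ages_months]
```

Under this assumption, a network trained on z-scores produces standardised outputs, and `from_z` would be applied to its predictions before an age in months or years is reported.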

Figure 7 shows the model building, in which the model is trained for 40 epochs to determine the
categorical loss, categorical accuracy, validation loss, validation accuracy, and multiclass area under
the curve (AUC). The three metrics displayed to evaluate how well the model performs are:

- Loss: the loss function's value for each epoch.

- Categorical accuracy: calculates the frequency with which predictions match one-hot labels.

- Multiclass AUC: the function "computes the approximate AUC via a Riemann sum for each label and then
takes the average".
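
To make the second metric concrete, the sketch below computes categorical accuracy by hand on a small made-up batch. It is an illustration of the definition and not the library implementation; the function name and the example arrays are assumptions.

```python
import numpy as np


def categorical_accuracy(y_true_onehot, y_pred_probs):
    """Fraction of samples whose arg-max prediction matches the one-hot label."""
    true_cls = np.argmax(y_true_onehot, axis=1)
    pred_cls = np.argmax(y_pred_probs, axis=1)
    return float(np.mean(true_cls == pred_cls))


# Four samples, three classes (made-up values).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],   # predicts class 1, but the label is class 2
                   [0.2, 0.5, 0.3]])

acc = categorical_accuracy(y_true, y_pred)   # 3 of 4 correct, i.e. 0.75
```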

The model analysis and evaluation in Figure 8 gives the number of females and males considered for
the determination of age and gender. The sample code shows the sequence of operations for training and
testing the model. The trained network is used to predict the labels of previously unseen images contained
in the test folder. The loss function, categorical accuracy, and categorical AUC are evaluated on the test
dataset. Figures 9 and 10 show the final result, in which the gender and age of the corresponding image
are obtained.


import pandas as pd

train_df = pd.read_csv('./boneage-training-dataset.csv')
test_df = pd.read_csv('./boneage-test-dataset.csv')
# append the file extension so that the ids match the image file names
train_df['id'] = train_df['id'].apply(lambda x: str(x) + '.png')
test_df['Case ID'] = test_df['Case ID'].apply(lambda x: str(x) + '.png')


train_df.head()


         id  boneage   male
0  1377.png      180  False
1  1378.png       12  False
2  1379.png       94  False
3  1380.png      120   True
4  1381.png       82  False


male      6833
female    5778
Name: gender, dtype: int64


[Bar chart: number of images per gender (female, male)]

Figure 8. Model analysis and evaluation 


Image name:14813.png Bone age: 7.0 years Gender: male 


Figure 9. Depicting age and gender 

Age: 19.000000Y
Predicted Age: 12.712466Y


Figure 10. Final result


5. CONCLUSION
Automatic feature learning for age and gender classification using multimodal approaches has been
tested with the proposed deep learning method based on convolutional networks. The results of the
empirical study also showed that pre-processing, subject-separated data partitioning, hyper-parameter
selection, and dataset size may all affect the final performance of the deep learning classifier. In this
study, deep learning convolutional neural network models were developed to identify age and gender more
accurately. In future work, hybrid techniques will be used, and their performance will be examined.